This repository has been archived by the owner on Jan 29, 2024. It is now read-only.

Custom canonical URL without html extension #1696

Closed
wants to merge 3 commits into from


angelinekwan
Collaborator

@angelinekwan angelinekwan commented Jan 16, 2023

What changed, and why it matters

On all pages of docs.aiven.io, the canonical tags point to the redirected URL versions ending with .html. For example, https://docs.aiven.io/docs/tools/api has a canonical of https://docs.aiven.io/docs/tools/api.html, which in turn redirects to the URL without .html. This creates a redirect-canonical loop and makes the pages non-indexable. The canonical tags need to point to the version without .html, so the canonical in the example should be https://docs.aiven.io/docs/tools/api

Resolving this issue

Solution

Instead of using the default html_baseurl, set a custom canonical tag in the base template so that we control how the pagename is rendered.

Demo

https://b0372b27.devportal.pages.dev/docs/tools/api
Screenshot 2023-01-16 at 12 11 07

@lornajane
Contributor

I'm confused about why this PR doesn't have a preview URL for us to check?

@angelinekwan
Collaborator Author

angelinekwan commented Jan 17, 2023

> I'm confused about why this PR doesn't have a preview URL for us to check?

Not sure either. I got the link from the Checks earlier, but it is now missing there as well. I added the preview URL to the PR description under Demo: https://b0372b27.devportal.pages.dev/

@lornajane
Contributor

I can see where we are going with this but the html_baseurl configuration is used by the sitemap, which now doesn't show URLs. I think we should try not to break the sitemap when fixing the canonical URLs.

@angelinekwan
Collaborator Author

angelinekwan commented Jan 17, 2023

> I can see where we are going with this but the html_baseurl configuration is used by the sitemap, which now doesn't show URLs. I think we should try not to break the sitemap when fixing the canonical URLs.

You are right. Checking the sitemap, the links still have .html in the page URL and are missing the domain.

I am not aware of anything else in the Sphinx build that uses html_baseurl, as the documentation only mentions: "It is used to indicate the location of document using The Canonical Link Relation."

Do you have any other suggestions for tackling this issue?
I tried removing the .html extension at the sphinx-build step. From the reference:
-b dirhtml -> Build HTML pages, but with a single directory per document. Makes for prettier URLs (no .html) if served from a webserver.
However, the canonical URL will still have .html (it is a known bug).

@lornajane lornajane removed their request for review January 18, 2023 15:22
@angelinekwan
Collaborator Author

This is not a good solution, since removing html_baseurl leaves the sitemap links without a hostname. The sitemap links should also not have the .html extension.
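For illustration only, the two sitemap requirements above (keep the hostname, drop the .html extension) can be expressed as a small Python helper. This is a sketch under assumptions, not something taken from this PR: `sitemap_url` is a hypothetical name, and how such a function would be wired into the sitemap generation is left open.

```python
from urllib.parse import urljoin


def sitemap_url(base_url: str, link: str) -> str:
    """Return an absolute sitemap URL without a trailing .html."""
    # Drop the extension if present; other links pass through unchanged.
    if link.endswith(".html"):
        link = link[: -len(".html")]
    # Normalise slashes so the link is resolved against the site root.
    return urljoin(base_url.rstrip("/") + "/", link.lstrip("/"))
```

With the example from the PR description, `sitemap_url("https://docs.aiven.io", "docs/tools/api.html")` gives `https://docs.aiven.io/docs/tools/api`.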
